Search CORE

186 research outputs found

Query Expansion of Zero-Hit Subject Searches: Using a Thesaurus in Conjunction with NLP Techniques

Author: A. Shiri
E.P. Lau
J. Greenberg
L. Hollink
L. Villén-Rueda
R. Mandala
Publication venue: Springer Fachmedien Wiesbaden GmbH
Publication date: 01/01/2012

Our study focuses on zero-hit queries in keyword subject searches and on increasing recall in these cases by reformulating and then expanding the initial queries with an external source of knowledge, namely a thesaurus. The study has two objectives. First, we map the query terms to thesaurus terms. Second, we use the matched terms to expand the user’s initial query by exploiting the thesaurus relations and applying natural language processing (NLP) techniques. We report on the overall procedure and elaborate on the key points and considerations of each step of the process.
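
The two steps named in this abstract (mapping query terms to thesaurus terms, then expanding the query through thesaurus relations) could be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the toy thesaurus, its relation names (`use_for`, `broader`, `narrower`) and both helper functions are assumptions, and the NLP step is reduced to case and whitespace normalization.

```python
# Hypothetical toy thesaurus: entries and relation names are illustrative assumptions.
THESAURUS = {
    "automobiles": {
        "use_for": ["cars", "motor cars"],      # non-preferred synonyms
        "broader": ["vehicles"],
        "narrower": ["electric automobiles"],
    },
}


def normalize(term):
    """Lowercase a term and collapse runs of whitespace."""
    return " ".join(term.lower().split())


def map_to_thesaurus(term, thesaurus):
    """Step 1: return the preferred thesaurus term for a query term, or None."""
    t = normalize(term)
    if t in thesaurus:
        return t
    for preferred, relations in thesaurus.items():
        if t in relations.get("use_for", []):
            return preferred
    return None


def expand_query(terms, thesaurus):
    """Step 2: add the preferred term and its related terms to the query."""
    expanded = []
    for term in terms:
        candidates = [normalize(term)]
        preferred = map_to_thesaurus(term, thesaurus)
        if preferred is not None:
            relations = thesaurus[preferred]
            candidates.append(preferred)
            for name in ("use_for", "broader", "narrower"):
                candidates.extend(relations.get(name, []))
        for candidate in candidates:
            if candidate not in expanded:  # keep order, drop duplicates
                expanded.append(candidate)
    return expanded
```

With the toy data, `expand_query(["Cars"], THESAURUS)` returns the original term followed by the preferred term and its related terms, while a term with no thesaurus match is passed through unchanged.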

E-LIS

Crossref

Evaluating the application of semantic inferencing rules to image annotation

Author: Hollink L.
Hunter J.
Little S.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2005

Semantic annotation of digital objects within large multimedia collections is a challenging task. We describe a method for semi-automatic annotation of images, and we apply it to and evaluate it on images of pancreatic cells. By comparing the performance of this approach in the pancreatic cell domain with previous results in the fuel cell domain, we aim to determine the characteristics of a domain that indicate whether the method will work in that domain. We conclude by describing the types of images and domains in which we can expect satisfactory results with this approach.

Crossref

University of Queensland eSpace

Semantic Annotation for Retrieval of Visual Resources

Author: Hollink L.
Publication venue: Amsterdam: Vrije Universiteit
Publication date: 01/01/2006

Visual material plays an ever greater role in our culture, as well as in science and education. Searching large collections of visual material nevertheless remains a laborious process: it costs an end user a great deal of time and effort to find exactly the one image they need. Efficient search methods are therefore needed to make the growing collections searchable and to keep them that way. Laura Hollink investigates the problems of searching for visual material, and possible solutions to them, in three diverse collections: paintings, photographs of organic cells, and news broadcasts.

Schreiber, A.T. [Promotor]; Wielinga, B.J. [Promotor]; Worring, M. [Copromotor]

CiteSeerX

VU Research Portal

CWI's Institutional Repository

Exploring concept representations for concept drift detection

Author: Becher O.L. (Oliver)
Elliott D. (Desmond)
Hollink L. (Laura)
Publication date: 11/09/2017

We present an approach to estimating concept drift in online news. Our method is to construct temporal concept vectors from topic-annotated news articles, and to correlate the distance between the temporal concept vectors with edits to the Wikipedia entries of the concepts. We find improvements in the correlation when we split the news articles based on the number of articles mentioning a concept, instead of calendar-based units of time.
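
The pipeline described in this abstract can be pictured with a small sketch: bin the articles that mention a concept by article count rather than by calendar unit, build one vector per bin, measure the distance between consecutive vectors, and correlate that series with an edit-count series. Everything concrete here is an assumption made for illustration: the abstract does not say how the vectors are built (a plain mean of per-article vectors is used), which distance is used (cosine distance is assumed), or which correlation measure is applied (Pearson is assumed).

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def chunk_by_count(items, size):
    """Split items into consecutive bins of `size` items (the last may be shorter)."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def mean_vector(vectors):
    """Component-wise mean; assumed stand-in for a temporal concept vector."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def drift_series(article_vectors, bin_size):
    """Distances between the concept vectors of consecutive article-count bins."""
    bins = chunk_by_count(article_vectors, bin_size)
    concept_vectors = [mean_vector(b) for b in bins]
    return [cosine_distance(a, b)
            for a, b in zip(concept_vectors, concept_vectors[1:])]


def pearson(xs, ys):
    """Pearson correlation of two equal-length series with non-zero variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)
```

In use, `drift_series(vectors, bin_size)` would be computed for the chronologically ordered articles mentioning one concept, and `pearson` would compare it with a hypothetical series of Wikipedia edit counts aligned to the same bins.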

CWI's Institutional Repository

Learning Semantic Query Suggestions

Author: Bron M.
de Rijke M.
Hollink L.
Huurnink B.
Meij E.
Publication venue: Radboud Universiteit Nijmegen, Information Foraging Lab
Publication date: 01/01/2010

International Migration, Integration and Social Cohesion online publications

A corpus of images and text in online news

Author: Bedjeti A. (Adriatik)
Elliott D. (Desmond)
Harmelen M. van
Hollink L. (Laura)
Publication date: 23/05/2016

In recent years, several datasets have been released that include images and text, giving impulse to new methods that combine natural language processing and computer vision. However, there is a need for datasets of images in their natural textual context. The ION corpus contains 300K news articles published between August 2014 and August 2015 in five online newspapers from two countries. The one-year coverage over multiple publishers ensures a broad scope in terms of topics, image quality and editorial viewpoints. The corpus consists of JSON-LD files with the following data about each article: the original URL of the article on the news publisher’s website, the date of publication, the headline of the article, the URL of the image displayed with the article (if any), and the caption of that image. Neither the article text nor the images themselves are included in the corpus. Instead, the images are distributed as high-dimensional feature vectors extracted from a Convolutional Neural Network, anticipating their use in computer vision tasks. The article text is represented as a list of automatically generated entity and topic annotations in the form of Wikipedia/DBpedia pages. This facilitates the selection of subsets of the corpus for separate analysis or evaluation.

CWI's Institutional Repository

Bias in the analysis of multilingual legislative speech

Author: Aggelen A.E. (Astrid) van
Hollink L. (Laura)
Ossenbruggen J.R. (Jacco) van
Publication date: 04/07/2017

In this paper we investigate the application of natural language processing tools to the multilingual proceedings of the European Parliament. This work is part of a study in which we explore (1) how subcorpora in different languages may lead to different conclusions about the political landscape, (2) how to determine where a potential language-related bias originates, and (3) to what extent we can limit or even prevent an unwanted language bias.

CWI's Institutional Repository

SWISH DataLab: A Web Interface for Data Exploration and Analysis

Author: Bogaard T. (Tessel)
Hollink L. (Laura)
Ossenbruggen J.R. (Jacco) van
Wielemaker J. (Jan)
Publication date: 08/12/2017

SWISH DataLab is a single integrated collaborative environment for data processing, exploration and analysis that combines Prolog and R. The web interface makes it possible to share the data, the code of all processing steps and the results among researchers, and a versioning system facilitates reproducibility of the research at any chosen point. Using search logs from the National Library of the Netherlands combined with the collection content metadata, we demonstrate how to use SWISH DataLab for all stages of data analysis, using Prolog predicates, graph visualizations, and R.

CWI's Institutional Repository

Metadata categorization for identifying search patterns in a digital library

Author: Bogaard T. (Tessel)
Hardman L. (Lynda)
Hollink L. (Laura)
Ossenbruggen J.R. (Jacco) van
Wielemaker J. (Jan)
Publication venue: Emerald
Publication date: 06/03/2019

Purpose: For digital libraries, it is useful to understand how users search in a collection. Investigating search patterns can help them to improve the user interface, collection management and search algorithms. However, search patterns may vary widely in different parts of a collection. The purpose of this paper is to demonstrate how to identify these search patterns within a well-curated historical newspaper collection using the existing metadata. Design/methodology/approach: The authors analyzed search logs combined with metadata

CWI's Institutional Repository